Orchestration and scheduling of services

ABSTRACT

This document relates to orchestration and scheduling of services. One example method involves obtaining dependency information for an application. The dependency information can represent data dependencies between individual services of the application. The example method can also involve identifying runtime characteristics of the individual services and performing automated orchestration of the individual services into one or more application processes based at least on the dependency information and the runtime characteristics.

BACKGROUND

In many cases, software applications are expected to exhibit low-latency processing. For instance, a cloud application might be expected to respond to a user query within 500 milliseconds. Meeting these latency expectations becomes more difficult as applications grow more complex.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

The description generally relates to techniques for orchestration and/or scheduling of services. One example includes a method or technique that can be performed on a computing device. The method or technique can include obtaining dependency information for an application. The dependency information can represent data dependencies between individual services of the application. The example method or technique can also include identifying runtime characteristics of the individual services and performing automated orchestration of the individual services into one or more application processes based at least on the dependency information and the runtime characteristics.

Another example includes a method or technique that can be performed on a computing device. The method or technique can include evaluating execution logs for an application having a plurality of services to identify different critical paths corresponding to multiple executions of the application. The method or technique can also include identifying a statistical critical path for the application based at least on frequency of occurrence of the different critical paths in the execution logs and scheduling individual services of the application based at least on whether the individual services occur on the statistical critical path.

Another example includes a system having a first computing cluster configured to execute a first application process, a second computing cluster configured to execute a second application process, and a computing device configured to execute an orchestrator. The orchestrator can be configured to obtain dependency information reflecting dependencies between a plurality of services of an application and obtain runtime information representing runtime characteristics of individual services. The orchestrator can also be configured to perform orchestration of the individual services into the first process on the first computing cluster and the second process on the second computing cluster based at least on the dependency information and the runtime characteristics.

The above listed examples are intended to provide a quick reference to aid the reader and are not intended to define the scope of the concepts described herein.

BRIEF DESCRIPTION OF THE DRAWINGS

The Detailed Description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of similar reference numbers in different instances in the description and the figures may indicate similar or identical items.

FIG. 1 illustrates an example processing flow for orchestrating an application, consistent with some implementations of the present concepts.

FIGS. 2 and 3 illustrate example orchestrations of an application, consistent with some implementations of the present concepts.

FIGS. 4-6 illustrate examples of execution instances of an application, consistent with some implementations of the present concepts.

FIG. 7 illustrates an example of a new service being inserted into an application process, consistent with some implementations of the present concepts.

FIG. 8 illustrates an example processing flow for scheduling an application, consistent with some implementations of the present concepts.

FIG. 9 illustrates example scheduling priorities for different executions of an application, consistent with some implementations of the present concepts.

FIG. 10 illustrates an example system, consistent with some implementations of the present concepts.

FIG. 11 illustrates an example method or technique for orchestrating an application, consistent with some implementations of the present concepts.

FIG. 12 illustrates an example method or technique for scheduling an application, consistent with some implementations of the present concepts.

DETAILED DESCRIPTION

Overview

As noted, many software applications strive to achieve low latency. For instance, a search application in the cloud might be expected to respond to user queries received over a network in a short amount of time, e.g., 500 milliseconds. As another example, a multi-user video game in the cloud might be expected to process controller inputs within a short period of time, e.g., 10 milliseconds.

However, writing low-latency code can be difficult. Generally, software developers find it easier to write modularized code that is self-contained, e.g., lacks complex data dependencies on other code modules. One approach that allows developers to write modularized code involves writing and deploying code modules as independent services that execute in separate processes, potentially on separate machines. Writing code using a service-based approach with code modules that lack dependencies on one another is a very scalable and agile approach. However, when the services are deployed on different machines, network and serialization costs can impact latency. Even if the services are deployed in different processes on the same machine, serialization costs still can affect latency. Moreover, while network costs have trended downward, serialization costs are relatively stable.

One approach to developing low-latency software while enabling developers to write independent services is to deploy all of the services of a particular application into the same process on the same machine. Because the services are on the same machine, there is no network cost, and because the services are in the same process, there is no serialization cost. This approach works to a point, but there are limitations on how many services can be deployed on a single machine.

For example, in some cases, an application simply grows too large for a single machine: the services for that application use more resources than the machine can physically provide. In other cases, the machine theoretically has sufficient resources to accommodate the entire application, but at runtime the services compete for resources in a manner that causes inefficient resource usage. In some cases, the resource competition on a given machine can have a higher cost than the network/serialization costs that would be incurred by moving one of those services to another machine to alleviate the resource competition.

One approach to addressing the above-mentioned issues is to manually orchestrate and schedule services. Orchestration refers to determining where code runs, e.g., whether individual services are deployed on the same machine or on different machines, in the same process or in separate processes, etc. Scheduling refers to determining when code runs, e.g., whether services can be run in anticipation and/or run in parallel, as well as runtime prioritization of one service relative to another, etc.

Developers generally have insights into their services that can be helpful for scheduling and orchestration purposes. However, it would be preferable to allow developers to focus on writing individual services without having to spend time making orchestration and scheduling decisions, as this may require contemplating the complex interaction between services. Moreover, developers ideally want to be able to deploy their code autonomously, by shipping services when they are ready, rather than having to spend time integrating their code with other services for optimization of orchestration and scheduling. When developers write code with orchestration and performance concerns in mind, they tend to write code that is more difficult to understand and maintain.

An alternative approach to manual scheduling and orchestration would be to perform automated analysis of source code in advance of scheduling and orchestration to infer resource usage patterns and schedule/orchestrate the services accordingly. However, it is not always feasible to anticipate ahead of time how code will act when deployed. For example, different use cases for software can result in different resource usage patterns that may not be apparent via source inspection.

The disclosed implementations generally aim to address the above deficiencies of conventional approaches to deploying application services. For example, the disclosed implementations provide orchestration mechanisms that can assign different services to different processes and/or different machines in view of various criteria, such as latency, reliability, and financial cost. The disclosed implementations also provide scheduling mechanisms that schedule individual services in view of runtime characteristics of those services. In some cases, dependency relationships between individual services can be used to inform orchestration and/or scheduling decisions.

Definitions

For the purposes of this document, the term “application process” refers to application code, memory allocated to execute the application code, and state associated with the application process. An application process can have one or more threads, each of which can share the same memory allocated to the process. For instance, in some cases, an application process can be allocated a designated virtual memory space by an operating system or hypervisor, and each thread in the application process can share that virtual memory space. The term “thread” refers to a sequence of instructions that can be scheduled independently at runtime.

At runtime, different services can be allocated one or more threads within a given process. When two or more services are concurrently executing in different threads, the services can be considered to run in parallel. In some cases, different threads can run on different processors, and in other cases, a single processor can execute multiple threads concurrently. The term “service” refers to a software module that can be deployed independently and interact with other services in a modular fashion. Generally speaking, services can have characteristics of functional code, i.e., services tend to lack complex data dependencies on other services. Because individual services lack complex data dependencies on other services, it is plausible to add, remove, or replace individual services within a given application without introducing logical errors or causing the application to crash.

The term “dependency graph” refers to a graph that represents data as nodes and services as edges. The term “edge” identifies both a service and a place in a dependency graph where the service can be run without violating any dependencies between that service and other services. Generally, dependency graphs can be obtained by static or dynamic analysis of an application. One type of dependency graph is an execution graph, which represents dependencies over a single execution of an application.

Dependency graphs can be used to ensure that the inputs to a service do not change after the service begins processing. When this is true, services can have the characteristics of functional code described above, which provides significant flexibility for orchestration and scheduling purposes. If an input to a service changes at runtime after that service begins processing, this can create a cycle in the dependency graph. Thus, one broad approach for orchestration and scheduling involves representing relationships between services as a directed acyclic graph, and then orchestrating and scheduling those services consistently with the dependencies conveyed by the directed acyclic graph.
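
For illustration, the following Python sketch shows one way such a dependency graph might be represented and verified as acyclic before orchestration and scheduling. The node names, service names, and use of Kahn's algorithm are illustrative assumptions, not a prescribed implementation.

```python
from collections import defaultdict, deque

# Data items are nodes; each service is an edge that consumes one data
# node and produces another. Node and service names are hypothetical.
edges = [
    ("query", "tokens", "tokenizer"),
    ("tokens", "candidates", "retriever"),
    ("candidates", "results", "ranker"),
]

def topological_order(edges):
    """Return data nodes in dependency order, raising if a cycle exists."""
    succ = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst, _service in edges:
        succ[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in succ[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        # A cycle would mean some service's input can change after it starts.
        raise ValueError("dependency graph contains a cycle")
    return order

print(topological_order(edges))  # ['query', 'tokens', 'candidates', 'results']
```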

Example Orchestration Processing Flow

As noted previously, orchestration generally refers to determining where code will run, e.g., by assigning individual services to run on a specific process and/or on a specific machine. FIG. 1 illustrates an example orchestration processing flow 100, consistent with the disclosed implementations. In the orchestration processing flow, an orchestrator 102 generates orchestration outputs 104 based on various inputs. For instance, the orchestration outputs can include process assignments that assign individual services to individual processes. In some cases, the orchestration outputs also designate a particular computer or cluster of computers to run respective instances of the processes.

To determine the orchestration outputs 104, the orchestrator 102 can consider various sources of input data. For instance, the orchestrator can consider expected runtime characteristics 106, actual runtime characteristics 108, and/or dependency information 110. Details on how the orchestrator can use these inputs to determine the orchestration outputs are described below with reference to FIGS. 2-7.

One way to obtain expected runtime characteristics 106 for a service is via a developer hint. Generally, a developer hint can include information about a given service provided by a developer of that service, e.g., an expected resource usage characteristic and/or runtime duration of that service. In some implementations, a developer hint can be provided up-front by a developer before a given service is ever run. For instance, a service may have one or more associated attributes provided by the developer at development time. Thus, developer hints can be used for orchestration purposes prior to actually running a given service. More generally, the term “expected runtime characteristics” encompasses any type of information, aside from information derived from execution logs, that conveys information about how a service is expected to behave at runtime.
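
The following Python sketch illustrates one plausible way a developer hint might be attached to a service as an attribute at development time. The decorator name and hint fields are hypothetical and do not reflect any particular framework's API.

```python
# Hypothetical decorator that records a developer hint on the service
# itself; the orchestrator can read these attributes before the service
# has ever run. Field names are illustrative.
def runtime_hint(**hints):
    def attach(service_fn):
        service_fn.runtime_hints = hints
        return service_fn
    return attach

@runtime_hint(cpu_ms=100, memory_mb=512, expected_duration_ms=40)
def rank_results(candidates):
    ...  # service body omitted

print(rank_results.runtime_hints)
# {'cpu_ms': 100, 'memory_mb': 512, 'expected_duration_ms': 40}
```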

Actual runtime characteristics 108 can be derived by employing runtime characteristics analysis 112 on one or more execution logs 114. Generally, the execution logs can reflect previous executions of a given application. The actual runtime characteristics derived from the execution logs can reflect information about resource usage, runtime duration (latency), and/or network and serialization costs of individual services that is collected over time. In some cases, any of the actual runtime characteristics provided in the execution logs can vary significantly over time, e.g., in different executions of the application.

Dependency information 110 can include information that reflects data dependencies between individual services of the application, e.g., in the form of a dependency graph. Generally, the dependency information can be derived by employing code analysis 116 on source code for individual services and/or by evaluating interactions between individual services at runtime. For instance, source code can be analyzed by recursively analyzing calls in the source code to determine which services invoke other services at runtime. Alternatively, or in addition, runtime evaluation of executable code can be performed to determine dependencies based on which services invoke other services at runtime.

Orchestration Example

FIG. 2 illustrates an example orchestration 200 of services. A computing cluster 202 runs an application process 204. In this example, an entire application is run within a single application process. Within the application process, various calls between services are represented as a dependency graph with nodes 206(1) through 206(7) representing data and edges 208(1) through 208(9) representing services. Thus, the dependency graph shown in FIG. 2 represents the application in a form that implicitly conveys dependency information about dependencies between a service represented by edge 208(7) and other services represented by other edges in the application. As noted above, in some cases such a dependency graph can be generated at compile time and in other cases at runtime. To ensure programmatic correctness, the edges of the dependency graph can be orchestrated and scheduled to ensure that the dependencies are honored.

In addition, note that application process 204 can be duplicated over one or more servers of computing cluster 202, e.g., each server can run a separate instance of application process 204. For simplicity of exposition, the following examples assume that each server in the computing cluster has identical hardware and software resources. However, the disclosed implementations can also be employed in heterogeneous processing environments where a given computing cluster may have machines with different resource constraints.

FIG. 3 illustrates an example of a modified orchestration 300 of the application. Here, part of the application is moved to another computing cluster 302, which runs another application process 304. Specifically, edge 208(7) has been moved from application process 204 on computing cluster 202 to application process 304 on computing cluster 302. At runtime, application process 204 can communicate node 206(4) to application process 304 using an inter-process communication mechanism such as a network transfer, for use as an input to edge 208(7). Application process 304 executes edge 208(7) on input node 206(4) to obtain output node 206(6), and then communicates output node 206(6) back to application process 204 for further processing via another network transfer.

Generally, FIGS. 4-7 describe individual scenarios where orchestrator 102 might modify the orchestration 200 shown in FIG. 2 to arrive at modified orchestration 300 shown in FIG. 3. Note that FIGS. 4-7 illustrate relatively simple examples to convey certain orchestration considerations in isolation. The following section entitled “ORCHESTRATION CONSIDERATIONS” provides more detail on how orchestration can be performed in more complex scenarios.

Orchestration Based on Resource Contention

FIG. 4 illustrates an example of how actual runtime characteristics 108 obtained from execution logs 114 can be used with dependency information 110 for orchestration purposes. Assume that the application has processed three queries when orchestrated in a single application process as shown in FIG. 2. FIG. 4 illustrates three executions, execution 402, execution 404, and execution 406, each represented by a corresponding execution graph. Each execution has a corresponding instance of working memory utilization, 412, 414, and 416, obtained from execution logs 114.

Note that an “execution” can refer to processing an input to obtain an output, and each execution may involve running the same process on different instances of input data. Thus, the representations shown in FIG. 4 show how much working memory was used by different edges in different executions. This data can be collected from a single instance of a process running on a single machine or from multiple instances of the process running on different machines.

In this example, edge 208(7) exhibits significantly more working memory utilization than any other edge. In a circumstance where the machines in computing cluster 202 are close to running out of memory, it may be useful to switch to the modified orchestration 300 by moving edge 208(7) over to computing cluster 302. By doing so, a significant amount of memory can be freed by moving only a single service from one process to another.

Note, however, that this decision can be based on additional criteria such as the serialization cost and network cost of moving edge 208(7) in view of the input and output dependencies shown in FIG. 4. Specifically, moving edge 208(7) to computing cluster 302 will involve serialization and network costs for both nodes 206(4) and 206(6), because edge 208(7) has a data dependency on node 206(4) and edge 208(9) has a data dependency on node 206(6). In other words, the output data of edge 208(3) is node 206(4), and this output data will be serialized on computing cluster 202 and sent over a network to computing cluster 302 for processing by edge 208(7). This will take a certain amount of time. Likewise, the output data of edge 208(7) is node 206(6). This output data will be serialized on computing cluster 302 and sent over the network to computing cluster 202 for processing by edge 208(9). In some cases, the latency cost imposed by the network and serialization of this data can exceed the benefit obtained by freeing up memory. The “ORCHESTRATION CONSIDERATIONS” section set forth below describes various approaches that can be used to weigh the network and serialization costs of candidate orchestrations against the potential performance benefits of orchestrating services on separate machines/processes.
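
This kind of comparison reduces to simple arithmetic. The following Python sketch, with purely illustrative costs, weighs the contention latency avoided by moving an edge against the round-trip serialization and network costs of shipping its input node out and its output node back.

```python
def net_latency_benefit_ms(contention_avoided_ms, serialize_in_ms,
                           network_in_ms, serialize_out_ms, network_out_ms):
    """Benefit of moving an edge off-machine, minus the cost of shipping
    its input node out and its output node back."""
    move_cost = (serialize_in_ms + network_in_ms
                 + serialize_out_ms + network_out_ms)
    return contention_avoided_ms - move_cost

# Hypothetical figures: 40 ms of contention avoided vs. 25 ms of move cost.
print(net_latency_benefit_ms(40, 5, 8, 4, 8))  # 15 -> the move looks worthwhile
```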

As another example of how actual runtime characteristics can inform orchestration decisions, assume that each edge uses 8 threads, and each physical machine in computing cluster 202 provides a total of 24 threads. Thus, no more than three edges can be run concurrently without running out of threads. Generally, the disclosed implementations can determine whether the serialization and network costs of moving one or more edges to another process will result in lower average latency than keeping those edges in the same process and accepting the cost of thread contention.

Considering the dependencies conveyed in FIG. 4, note that edges 208(4), 208(5), 208(6), and 208(7) have no data dependencies on one another. Thus, it is plausible that these edges can be run in parallel on a given machine. However, collectively, these edges utilize a total of 32 threads. Thus, some implementations may choose one of these edges to move to computing cluster 302. For instance, one approach is to choose the service with the lowest serialization and network cost and compare those costs to the expected latency impact of thread contention. Assuming edge 208(7) has the lowest serialization and network cost of these four edges and that the serialization and network cost is below the thread contention cost (e.g., latency hit) of keeping the edge on the same computing cluster as the other edges, then edge 208(7) can be moved to computing cluster 302 as shown in FIG. 3.
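
A minimal sketch of this thread-budget reasoning appears below; the thread counts match the example above, while the per-edge move costs are assumed for illustration.

```python
MACHINE_THREADS = 24  # per-machine thread budget from the example above

# Edges with no data dependencies on one another; move costs are assumed.
parallel_edges = {
    "208(4)": {"threads": 8, "move_cost_ms": 30},
    "208(5)": {"threads": 8, "move_cost_ms": 45},
    "208(6)": {"threads": 8, "move_cost_ms": 25},
    "208(7)": {"threads": 8, "move_cost_ms": 10},
}

def edges_to_move(edges, budget):
    """Move the cheapest edges (by serialization + network cost) until the
    remaining edges fit within the machine's thread budget."""
    moved, remaining = [], dict(edges)
    while sum(e["threads"] for e in remaining.values()) > budget:
        cheapest = min(remaining, key=lambda n: remaining[n]["move_cost_ms"])
        moved.append(cheapest)
        del remaining[cheapest]
    return moved

print(edges_to_move(parallel_edges, MACHINE_THREADS))  # ['208(7)']
```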

In other implementations, edges can be orchestrated based on other types of resource utilization, such as network usage, storage throughput, and/or storage capacity utilization. In the case of network utilization, some services may utilize a network resource such as a network card very heavily and cause other services to block or sleep while waiting to access the network resource. When this occurs, latency can potentially be reduced by moving that service (or another service) to another process/machine, subject to consideration of the network and serialization costs of doing so. Storage throughput is similar, e.g., if one service exhibits heavy utilization of the storage bus, this can slow down other services that also access storage, and orchestrating one or more services to another machine may result in a net latency improvement even after the network and serialization costs are considered. Storage capacity utilization can be viewed as a more static consideration, e.g., whenever storage starts to become full, one or more services can be selected to orchestrate on a different machine.

Orchestration Based on Execution Frequency

FIG. 5 illustrates another example of how actual runtime characteristics obtained from the execution logs can be used for orchestration purposes. FIG. 5 illustrates three executions, execution 502, execution 504, and execution 506. In this example, the execution graphs of the individual executions are not all identical: the execution graph of execution 502 invokes all edges of the application, but the latter two executions 504 and 506 have execution graphs that do not invoke edge 208(3) and edge 208(7). FIG. 5 illustrates the concept that some services may run less frequently than other services. In general, it can be useful to orchestrate edges that run less frequently into separate processes, particularly for services that utilize resources even when they are not executed. Assume in this example that edge 208(3) has an extremely high serialization cost, whereas edge 208(7) does not. Thus, in this particular case, it can be useful to orchestrate edge 208(7) into application process 304 on computing cluster 302, while retaining edge 208(3) in application process 204 on computing cluster 202.
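
One plausible way to derive such execution frequencies from the execution logs is sketched below; the log format is an assumption for illustration.

```python
from collections import Counter

# Each log entry lists the edges invoked in one execution of the
# application; the log format is assumed for illustration.
executions = [
    {"208(1)", "208(2)", "208(3)", "208(7)", "208(9)"},
    {"208(1)", "208(2)", "208(9)"},
    {"208(1)", "208(2)", "208(9)"},
]

invocations = Counter(edge for run in executions for edge in run)
frequency = {edge: count / len(executions) for edge, count in invocations.items()}
rarely_run = sorted(edge for edge, freq in frequency.items() if freq < 0.5)
print(rarely_run)  # ['208(3)', '208(7)'] -> candidates for a separate process
```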

Note that services that run less frequently may be placed into clusters with relatively fewer computers. For instance, if there are 10,000 computers in a main cluster and a given service only runs on a small fraction of the executions of the application, that service might be put into a separate process that is executed on a much smaller cluster, e.g., 200 machines. Note also that execution frequency can change over time, e.g., if edge 208(7) starts to become invoked more frequently in the future, edge 208(7) could be moved back to application process 204 on computing cluster 202, potentially while moving another edge over to application process 304 on computing cluster 302.

Orchestration Based on Statistical Critical Path

FIG. 6 illustrates another example of how actual runtime characteristics obtained from the execution logs can be used with dependency information for orchestration purposes. FIG. 6 illustrates three executions, execution 602, execution 604, and execution 606. Each execution has corresponding edge runtimes 612, 614, and 616, obtained from execution logs 114.

Generally, the critical path through an application is the path that determines the overall latency for that application. In some applications, the critical path may always go through the same edges and nodes, but this is not always the case. In FIG. 6, the critical path for each individual execution is shown in bold. Note that the critical path for execution 604 is different than the critical path for executions 602 and 606.

Generally, it can be useful to keep edges in the critical path in the same process so that those edges do not incur serialization/network costs. Here, note that edges 208(3) and 208(7) are never in the critical path. Thus, these edges could be good candidates for orchestration in a separate process. Assume again that edge 208(3) has an extremely high serialization cost, whereas edge 208(7) does not. Thus, in this particular case, it can be useful to orchestrate edge 208(7) into application process 304 while retaining edge 208(3) in application process 204.

In some cases, it may be necessary or useful to orchestrate edges that sometimes appear in the critical path into another process. Generally, the less often an edge appears in the critical path, the more likely that overall performance can be maintained even if that edge is moved to another process. Thus, consider edge 208(6), for instance. This edge is in the critical path once. Thus, all other things being equal, it is likely preferable to orchestrate edge 208(6) into a separate process or machine rather than edges 208(1), 208(4), and 208(8), which appear in the critical path more frequently.

While FIG. 6 illustrates three example executions, in practice, there may be a very large number of executions stored in the execution logs. Statistical analysis may be performed over the execution logs to determine the “statistical critical path,” e.g., a path that is the critical path in more executions of the application than any other path. Another approach to identifying the statistical critical path is to consider only those executions where an SLA-specified latency is exceeded, and identify the most-frequently occurring critical path in those executions. Generally, it can be useful to ensure that edges on the statistical critical path all run in the same process/machine when possible.
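
A minimal sketch of identifying the statistical critical path from logged critical paths follows; the path tuples mirror the FIG. 6 example, and the log format is assumed.

```python
from collections import Counter

# One critical path per logged execution, each a tuple of edge names;
# the counts mirror the FIG. 6 example.
logged_critical_paths = [
    ("208(1)", "208(4)", "208(8)"),
    ("208(2)", "208(6)", "208(9)"),
    ("208(1)", "208(4)", "208(8)"),
]

path, count = Counter(logged_critical_paths).most_common(1)[0]
print(path, f"- critical in {count} of {len(logged_critical_paths)} executions")
```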

In some cases, the remaining edges that are not on the statistical critical path can be evaluated using a distance metric. The distance from the critical path can be defined for a given edge as the amount of additional time that edge would have had to run in a given execution before appearing on the critical path. Thus, for instance, consider execution 602. The critical path is 150 milliseconds, i.e., 50 milliseconds each for edges 208(1), 208(4), and 208(8). Edge 208(2) took 20 milliseconds, and in this execution was in a path through edges 208(5) and 208(8) that took 80 milliseconds total. Thus, edge 208(2) has a distance of 70 milliseconds from the critical path in this instance. Edge 208(7), on the other hand, is only part of a single path that took 30 milliseconds, and thus has a distance of 120 milliseconds from the critical path.

One approach for orchestrating edges that are not on the statistical critical path is to sort those edges by average distance from the statistical critical path over multiple executions. Generally, edges that are, on average, closer to the statistical critical path can be preferentially placed in a given process/machine with the edges on the statistical critical path. Likewise, edges further away from the statistical critical path can be preferentially selected for orchestration in a separate process or machine that includes only edges that are not on the statistical critical path. As discussed more below, distance from the statistical critical path can be used in conjunction with other information, such as resource utilization, network costs, and serialization costs, to orchestrate edges in a manner that tends to reduce or potentially minimize average latency for the application.
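
The distance metric can be computed directly from per-execution path durations, as in the following sketch, which reproduces the execution 602 numbers above.

```python
# Distance from the critical path for one execution: the critical path's
# duration minus the duration of the longest path containing each edge.
# Numbers follow the execution 602 example above.
critical_path_ms = 150
longest_path_containing = {
    "208(2)": 80,   # path through edges 208(5) and 208(8)
    "208(7)": 30,   # the only path containing this edge
}

distance = {edge: critical_path_ms - duration
            for edge, duration in longest_path_containing.items()}
print(distance)  # {'208(2)': 70, '208(7)': 120}
# Averaging these per-execution distances yields the sort key described above.
```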

Orchestration Based on Expected Runtime Characteristics

FIG. 7 illustrates another example that may implicate an orchestration change. Here, an existing version of a service corresponding to edge 208(7) is replaced by a new version of the service, e.g., a new service 702. The new service includes a developer hint that it is expected to use a total of 100 milliseconds of CPU time over one or more threads. If this is likely to cause sufficient CPU contention to exceed the network/serialization costs of moving the new service module to another process, the new edge can be orchestrated into a separate process, for reasons discussed above with respect to FIG. 4.

For instance, assume that an older version of edge 208(7) used 50 milliseconds of total CPU time, and that edges 208(4), 208(5), and 208(6) each used 50 milliseconds of total CPU time. Before the deployment of new service 702, the aggregate CPU time of all of these edges may have been low enough that keeping the edges orchestrated in the same process was preferable to moving one or more edges to a different process. However, since the developer hint conveys that the new service is expected to use 100 milliseconds of total CPU time, the orchestrator can anticipate that CPU contention costs may exceed the additional network and serialization costs of moving one or more edges to a different process. For instance, the orchestrator could proactively orchestrate the new service module into application process 304 on computing cluster 302. Without the developer hint, the orchestrator might have initially orchestrated the new service module in application process 204 and potentially introduced latency issues into the application that would not have been recognized until the next analysis of the execution logs.

Note that developer hints can convey other information, such as anticipated memory usage, storage capacity utilization, storage throughput, network usage, runtimes, serialization costs, etc. In each case, the developer hint can be used as an approximation of actual runtime utilization of these resources for orchestration purposes, as described elsewhere herein. After a new edge has executed for some time, the developer hint may be ignored in favor of using actual runtime characteristics for that edge as obtained from execution logs 114.

Also, note that the developer hint may give information about the expected execution time for that edge. In that case, the expected execution time for the new edge may be substituted for the average execution time of the edge being replaced for the purposes of critical path analysis as described elsewhere herein. For instance, assume that a given edge has a distance of 10 milliseconds from the statistical critical path. If a new version of that edge is received with a developer hint indicating that the new version will average 60 milliseconds of execution time, then this may result in a new statistical critical path that includes that new edge, and the application may be orchestrated accordingly.

Static vs. Dynamic Orchestration

Generally, the disclosed orchestration techniques can be implemented statically at compile time, or dynamically during run time. For instance, a static approach to moving edge 208(7) is to stop application process 204, compile and link a new executable that lacks edge 208(7), and run this new executable as a new instance of application process 204. Such a static approach could also involve compiling edge 208(7) into a separate executable and running the separate executable in application process 304.

A dynamic approach could involve moving edge 208(7) at runtime. In this approach, application process 204 continues running without being halted, and application process 304 is constructed as described above for static orchestration. In still further implementations (not shown), edge 208(7) can be moved into an existing process that is already running one or more other services without halting the existing process.

Orchestrating an edge from one running process to another running process is generally possible due to the modularized, functional characteristics of services. Because of these characteristics, functional services do not necessarily need to be run in the order defined by the developer in the source code. In other words, the services can be constructed so that, as long as the dependency constraints conveyed by the dependency information are honored, the services can be moved from process to process and/or run in parallel, in anticipation, and independently prioritized as discussed below with respect to scheduling. In a case where an application process is handling multiple executions concurrently (e.g., multiple queries) and a new edge is inserted into the application process, the application process can handle existing executions that have already started with the old version of the edge and handle new executions with the new version of the edge. Once the existing executions have completed, the old version of the edge can be removed from the application process.

Orchestration Considerations

FIGS. 3-7 illustrate a few specific examples of circumstances under which it could be useful to adjust how the application is orchestrated, e.g., by moving edge 208(7) from computing cluster 202 to computing cluster 302. The following description gives some additional details on orchestration approaches. Specifically, the following description relates to how runtime characteristics of an application can be analyzed in view of dependency information for that application to arrive at a relatively efficient orchestration of the application. For instance, an orchestration can be selected based on one or more orchestration objectives, such as reducing or minimizing latency, increasing or maximizing reliability, or reducing or minimizing financial cost. Note that latency, reliability, and cost are just a few examples of orchestration objectives that can be employed.

Generally, orchestration of an application can occur when an application is first deployed, and the orchestration can subsequently be modified over time. An application may start as a relatively small collection of services that perform a specific function, such as a search engine that responds to user queries. Over time, developers may refine the application by adding new features via new services and/or replacing individual services with updated versions, such as a new ranking algorithm for search results.

Each time a new service is deployed, the orchestrator 102 can be run to determine a new orchestration for the application. In addition, the orchestrator can run periodically or constantly to evaluate how the actual runtime characteristics of individual services change over time. Thus, orchestration can continually be modified as an application executes over time and is modified by new services, and/or as runtime characteristics of the individual services are detected over time. Generally, modifying an orchestration can involve selecting a particular service to move from one process to another process, potentially on different machines.

One heuristic approach for orchestration is to start by orchestrating all services into a single process. This is generally a high-performance approach because there is no serialization or network overhead, as communications between services can happen within memory allocated to the application process. Once the machine reaches a point where resource contention becomes an issue, alternative orchestrations can be considered.

Generally, there are several sources of information that can be used to detect potential resource contention. As noted, the execution logs 114 can convey information for each edge over many executions of the application, such as the amount of runtime memory, processor utilization, network utilization, number of threads, storage capacity utilization, and/or storage throughput. Alternatively, or in addition, this information can be provided at deployment time via expected runtime characteristics 106 in the form of a developer hint or other representation of how a given service is expected to act at runtime.

Memory generally becomes constrained when the total amount of memory used by the services in the process gets close to exceeding the physical memory available on the machine. Memory can be utilized by executable code, permanent data used by a service (e.g., a dictionary), and also runtime memory that varies during execution (e.g., stack or heap memory). The processor generally becomes constrained when one service is ready to run but is waiting for another service, e.g., because no threads are available. Another example of a resource conflict that can be considered is a cache conflict, where a first service may run slower because the first service has to retrieve data from memory when a second service keeps evicting the first service’s data from a processor cache. Network and storage throughput generally become constrained when the bandwidth of the network or storage bus is exceeded by the collective demand imposed by the services within a process. Storage capacity generally becomes constrained when storage drive(s) for a given machine begin to fill up.

Once resource contention is detected, one or more edges can be selected to move to another process. A rules-based or heuristic approach can generally select the edge or edges based on predetermined criteria, some of which are discussed above relative to FIGS. 4-7. For instance, it is generally useful to move edges that have relatively low serialization and network costs, as these factors can contribute greatly to latency. As another criterion, it is generally useful to preferentially move edges that run less frequently instead of edges that run more frequently. As another criterion, it is generally useful to preferentially move edges that are not in the statistical critical path, and that are less likely than other edges to occur in the critical path for a given execution.

Taking the criteria specified above into account, consider a scenario where the application has a service level agreement to respond to 99.9% of queries within 500 milliseconds. The orchestrator 102 can be programmed with a rule that states: when memory utilization exceeds 80%, select the edge that uses the most memory out of the edges that are not in the statistical critical path and move that edge to another process. This can be implemented using the static or dynamic approaches discussed above. A more refined rule might state: when memory utilization exceeds 80%, select the edge that uses the most memory out of the edges that are at least 100 milliseconds from the statistical critical path and have network and serialization costs below 50 milliseconds, and move that edge to another process.
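
The more refined memory rule above might be expressed as in the following sketch; the edge metadata and field names are illustrative assumptions.

```python
def pick_edge_to_move(edges, memory_utilization):
    """Refined memory rule: above 80% memory utilization, move the
    highest-memory edge that is at least 100 ms from the statistical
    critical path and costs under 50 ms to serialize and ship."""
    if memory_utilization <= 0.80:
        return None
    candidates = [e for e in edges
                  if e["distance_ms"] >= 100 and e["move_cost_ms"] < 50]
    return max(candidates, key=lambda e: e["memory_mb"], default=None)

edges = [
    {"name": "208(3)", "memory_mb": 900, "distance_ms": 120, "move_cost_ms": 200},
    {"name": "208(7)", "memory_mb": 700, "distance_ms": 120, "move_cost_ms": 20},
]
selected = pick_edge_to_move(edges, 0.85)
print(selected["name"] if selected else None)
# 208(7): 208(3) uses more memory, but its move cost disqualifies it
```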

The above rules could be modified similarly for thread utilization, e.g., when the number of threads in a given layer of the dependency graph for an application reaches 95% of the total number of threads, identify the edges that are not in the statistical critical path, sort them by the number of threads those edges use, and move the edge that uses the most threads. A more refined rule might state: when the number of threads in a given layer of the dependency graph reaches 95% of the total number of threads, identify the edges that are at least 100 milliseconds from the statistical critical path and have network and serialization costs below 50 milliseconds, sort those edges by the number of threads used, and move the edge that uses the most threads.

The above examples illustrate how information such as resource utilization, network and serialization costs, and dependency information can be used to generate rules for orchestrating code. However, in some cases, applications can be very complex, with many different services, some of which run on each execution and some of which rarely run. Individual services can range from consistently using the same amount of resources to varying wildly in the amount of resources used at runtime. Likewise, individual services can consistently execute in approximately the same amount of time or can vary wildly in execution times. These factors can be constantly changing in view of how workload changes over time as well as when new services are added to the application.

In addition to the factors outlined above, there may be a very large number of potential orchestrations that are consistent with the dependencies of a given application. Thus, it may be implausible for a heuristic approach to consider all of the potential orchestrations. In addition, there may be instances where optimal or near-optimal orchestrations are not intuitive. For instance, referring back to FIG. 4, a relatively straightforward approach is to move edge 208(7) to another process when memory becomes constrained. However, there may be circumstances when average latency is lower if edge 208(7) is retained and other edges are moved to another process to free memory, and it can be difficult to design a heuristic approach to capture these circumstances.

As an alternative to the heuristic orchestration approaches outlined above, another orchestration approach involves the use of a linear solver or machine-learned model to orchestrate code. Generally, a solver or machine-learned model can use dependency information that indicates ordering constraints for individual services. The dependency information and/or execution logs can also indicate what data is communicated between different services, e.g., the inputs and outputs of each service. By evaluating the size of the data communicated between services, the solver or machine-learned model can infer the serialization and network costs of moving one or more services to another machine.

The solver or machine learning model can orchestrate the code with a given orchestration objective, e.g., to achieve the lowest possible latency given a fixed amount of resources. The solver or machine learning model can consider potential orchestrations of edges on different processes/machines and output a final orchestration given this objective. The solver or machine learning model can also have constraints on the potential orchestrations to avoid exceeding resources available on each machine, e.g., to avoid running out of memory, storage capacity, available threads, etc.

In some implementations, the solver or machine learning model can be programmed to meet a service level agreement (SLA) with a reduced and/or lowest possible financial cost, e.g., using an objective function defined over cost, latency, and/or the reliability required by the SLA. Generally, a solver can consider most or all of the potential orchestrations for an application, evaluate each of the potential orchestrations according to these criteria, and select an optimal or near-optimal orchestration.

In the case of a machine learning model, the previous execution logs can be used as training data. Executions that met the SLA-defined latency can be labeled as positive training examples, and executions that took longer than the SLA-defined latency can be labeled as negative examples. For instance, a neural network could be trained using these training examples. As another example, a reinforcement learning approach could be adopted with a reward function that rewards certain orchestration objectives. In a machine-learning context, the inputs to the orchestrator 102 shown in FIG. 1 can provide features for learning, e.g., the expected runtime characteristics, actual runtime characteristics, and/or dependency information can be used as features.
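
A minimal sketch of labeling logged executions against an SLA for supervised training follows; the log fields and feature encoding are assumptions for illustration.

```python
SLA_LATENCY_MS = 500  # illustrative SLA threshold

# Hypothetical log entries pairing an orchestration with observed latency.
execution_logs = [
    {"orchestration": "all-in-one-process", "latency_ms": 430},
    {"orchestration": "edge-208(7)-split-out", "latency_ms": 610},
]

# Label 1 if the execution met the SLA, 0 otherwise.
training_examples = [
    (log["orchestration"], 1 if log["latency_ms"] <= SLA_LATENCY_MS else 0)
    for log in execution_logs
]
print(training_examples)
# [('all-in-one-process', 1), ('edge-208(7)-split-out', 0)]
```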

Over time, as more execution logs are collected, new edges arrive with expected runtime characteristics, and/or dependency information for a given application changes, the solver or machine learning model can adjust how the application is orchestrated. In the case of dynamic orchestration, the orchestrator 102 can continually modify the orchestration of the application by preferentially moving selected edges to different processes to meet the SLA, even while developers continue to independently develop and deploy new services and the application continues running without interruption. As a consequence, not only does the application perform well, but developers can spend their time focused on writing quality services rather than concerning themselves with orchestration concerns.

Orchestration with Dynamic Dependencies

In some implementations, the dependency information for an application can change in mid-execution. For instance, consider an edge that is part of a loop that always runs at least three times, and potentially could run an infinite number of times. At compile time, it may be possible to construct a partial dependency graph for the application with three edges, one for each of the known loop iterations. At runtime, the partial dependency graph can be completed once the final number of loop iterations is known, e.g., based on a value output by an edge closer to the root node.

In such a case, orchestration can include re-routing services in mid-execution to adapt the application in view of the detected runtime change to the dependency information. For instance, assume that each iteration of the loop uses 1 gigabyte of memory, and that the partial dependency graph includes services that leave 50 gigabytes of memory available. Further assume that, at runtime, a given execution results in a runtime value of a loop counter that will cause the loop to be executed for 100 loop iterations. It follows that, to avoid running out of memory, at least 50 gigabytes of memory need to be freed up. The orchestration approaches discussed in the previous section can be employed in mid-execution, e.g., to determine that some of those iterations should be performed by instances of the services that are available in another process. In other words, the orchestrator 102 can select, out of the edges that have not yet been run for that execution, which edge or edges should be executed by another process. In some cases, the orchestrator might select the edges that execute the loop iterations in question. However, this is not necessarily the case, as the orchestrator may instead choose different edges, e.g., if the serialization/network costs of moving the loop iterations exceed the serialization/network costs of moving one or more other edges that collectively use 50 gigabytes or more of memory, then the orchestrator may select the other edges to reroute to a different process/machine.

Example Scheduling Processing Flow

As noted previously, scheduling generally refers to determining when code runs, e.g., whether services can be run in anticipation, run in parallel, runtime prioritization of threads for one service relative to another, etc. FIG. 8 illustrates an example scheduler processing flow 800, consistent with the disclosed implementations. In the scheduling processing flow, scheduler 802 generates scheduling outputs 804 based on various inputs. For instance, the scheduling outputs can determine when individual edges are scheduled to run and can also determine the respective scheduling priorities for the thread or threads used to run each edge.

To determine the scheduling outputs 804, the scheduler 802 can consider various sources of input data. For instance, the scheduler can consider expected runtime characteristics 106, actual runtime characteristics 108, dependency information 110, and/or orchestration outputs 104 as discussed previously. The orchestration outputs generally indicate which services will be run together in a given process, e.g., the scheduler operates under the constraint that each service executes in an application process that has been assigned by the orchestrator 102.

The scheduler 802 also operates under the constraint that services are run consistently with the dependency information. For instance, a given edge cannot be run until the input data for that edge is available. However, once the input data for a given edge is available, that edge can be scheduled at any time. Generally, this is plausible due to the previously noted characteristics of services, e.g., they lack complex data dependencies on one another and thus can be scheduled in a flexible manner. This is in contrast to conventional imperative programming, where the source code itself conveys the order in which operations are performed and the developer is responsible for ensuring that complex data dependencies between code modules are handled correctly.

One approach for scheduling code involves generally prioritizing specific services that are closest to the root node of a dependency graph. For instance, the scheduler 802 can sort the edges of a dependency graph by proximity to the root node. For each edge in a given layer, the scheduler can assign the same scheduling priority to each thread that runs that edge, with edges closer to the leaf nodes having lower scheduling priorities. The scheduler can also attempt to run edges closer to the root node sooner if possible, by selecting those edges over other edges closer to the leaf nodes to be run in anticipation and/or in parallel, provided the input data is available for a given edge. In some cases, the scheduler can even optimistically execute edges that do not necessarily run in every execution of the application. Those optimistic executions can be canceled when a final determination is made that those services will not be needed, to free up any resources utilized by those optimistic executions.
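
The following sketch illustrates this layer-based priority assignment, using an assumed layering of the FIG. 2 edges and an arbitrary priority scale.

```python
# Assumed layering of the FIG. 2 edges: layer 0 is closest to the root.
edge_layers = {
    "208(1)": 0, "208(2)": 0, "208(3)": 0,
    "208(4)": 1, "208(5)": 1, "208(6)": 1, "208(7)": 1,
    "208(8)": 2, "208(9)": 2,
}

MAX_PRIORITY = 10  # arbitrary scale; higher runs sooner
priorities = {edge: MAX_PRIORITY - layer for edge, layer in edge_layers.items()}
print(priorities["208(1)"], priorities["208(7)"], priorities["208(9)"])  # 10 9 8
```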

As noted previously, some implementations may use the execution logs 114 to identify a statistical critical path. Generally, the scheduler 802 can prioritize edges along the statistical critical path higher than edges that are not on the statistical critical path. In other words, the scheduler can preferentially schedule edges on the statistical critical path to run earlier than other edges and/or assign relatively higher scheduling priorities to threads allocated to edges on the statistical critical path than to threads allocated to other edges. Likewise, the scheduler can prioritize the other edges based on their relative distance from the critical path in a similar manner. In some cases, the scheduler can calculate the statistical critical path independently from the inputs shown in FIG. 8. In other implementations, the scheduler can obtain the statistical critical path from the orchestrator 102.

Scheduling Example

FIG. 9 illustrates a scheduling example that conveys certain scheduling concepts described above. Assume that for a first execution 902, the scheduler assigns threads for each edge with a corresponding scheduling priority that is based on the distance of that edge from the root node. FIG. 9 includes initial scheduling priorities 904 assigned by the scheduler. Note that the edges in each layer share the same scheduling priorities for their respective threads, and that edges closer to the root node have higher scheduling priorities. This can provide a reasonable first-order scheduling mechanism because latency can typically be reduced if edges closer to the root node are prioritized over edges later in the dependency graph. This is because, as a general rule, edges closer to the root node have more edges that depend directly or indirectly on the output of that edge.

However, the above approach can generally be extended by considering which edges are actually likely to be in the statistical critical path. Refer back to the example shown in FIG. 6, where the critical path is shown in bold for three different executions of the application. In this case, edges 208(1), 208(4), and 208(8) appear in the critical path ⅔ of the time, and thus are in the statistical critical path. Edges 208(2), 208(6), and 208(9) appear in the critical path once. One potential scheduling approach is to prioritize scheduling of edges first by layer and then by likelihood of appearing in the critical path. This approach is shown for execution 906 in FIG. 9 via priorities 908. Within each layer, edges that appear more frequently in the critical path have higher scheduling priorities than other edges in those layers.
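
This layer-then-frequency ordering can be expressed as a simple sort, as sketched below using the critical-path counts from FIG. 6 and the same assumed layering as the earlier sketch.

```python
# Critical-path appearance counts over the three FIG. 6 executions.
critical_path_counts = {
    "208(1)": 2, "208(2)": 1, "208(3)": 0,
    "208(4)": 2, "208(5)": 0, "208(6)": 1, "208(7)": 0,
    "208(8)": 2, "208(9)": 1,
}
edge_layers = {  # assumed layering, as in the earlier sketch
    "208(1)": 0, "208(2)": 0, "208(3)": 0,
    "208(4)": 1, "208(5)": 1, "208(6)": 1, "208(7)": 1,
    "208(8)": 2, "208(9)": 2,
}

# Lower layer first; within a layer, more critical-path appearances first.
schedule_order = sorted(edge_layers,
                        key=lambda e: (edge_layers[e], -critical_path_counts[e]))
print(schedule_order)
# ['208(1)', '208(2)', '208(3)', '208(4)', '208(6)', '208(5)', '208(7)',
#  '208(8)', '208(9)']
```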

Another approach is to schedule edges first by layer and then by average distance from the statistical critical path. This is a similar approach that, in some cases, yields different results. Consider a first edge that appears three times in the critical path over 10,000 executions. For instance, the first edge might have an average duration of 10 milliseconds but may have taken much longer, e.g., 100 milliseconds, on the three executions in which the first edge appeared in the critical path. Consider a second edge in the same layer as the first edge that has an average duration of 15 milliseconds but does not once appear in the critical path over 10,000 executions. In an implementation where edges are prioritized by distance from the statistical critical path, it is plausible that the second edge may be prioritized higher than the first edge. In an implementation where edges are prioritized based on how frequently the edges appear in the critical path over multiple executions, the first edge will be prioritized over the second edge.

In addition, note that the previous example assumes that each edge in a given layer has a lower priority than any edge in a layer that is closer to the root. This is one plausible implementation but is not intended to imply any limitation. For instance, referring back to FIG. 6, other implementations might give edge 208(8) a higher priority than edge 208(3), as edge 208(3) never appears in the critical path over the three illustrated executions.

In addition, note that the description above with respect to FIG. 9 used scheduling priorities as an example of prioritized scheduling. However, a similar approach can be adopted for determining when to schedule a given edge. For instance, assuming that input data is available for multiple different edges, the scheduler can preferentially schedule any of those edges that are on the statistical critical path to run before scheduling other edges that are not on the statistical critical path. Likewise, of the remaining edges for which input data is ready and that are not on the statistical critical path, the scheduler can preferentially schedule those edges in order based on the frequency with which those edges have appeared in the critical path for individual executions and/or the respective distances of those edges from the statistical critical path.

Furthermore, scheduling can also consider expected runtime characteristics 106 and/or actual runtime characteristics 108. For instance, as noted above, the expected runtime characteristics for a new edge can be used to calculate a new statistical critical path for a given application. In addition, scheduler 802 can generally try to avoid scheduling edges to run concurrently on the same machine when the resource utilization characteristics of those edges are likely to cause a resource conflict, as discussed above with respect to orchestration.

In some cases, the scheduler 802 might adjust scheduling of a given edge when a new service is received with a developer hint indicating that the service will have a resource conflict that was not present with a previous version of that service. In other cases, the scheduler might detect resource contention after running a new service for a certain amount of time and adjust how that service is scheduled accordingly. As noted previously, services can generally be scheduled whenever input data is available, provided the dependencies for the application are honored. Thus, if the scheduler determines that running a particular group of edges together will likely create a resource conflict, the scheduler can preferentially run one or more of those edges when input data becomes available over other edges that may not be involved in a resource conflict. For instance, the scheduler may preferentially schedule edges so that they complete and cease utilizing a given resource before other edges that use that same resource heavily are scheduled to run.

Example System

The present implementations can be performed in various scenarios on various devices. FIG. 10 shows an example system 1000 in which the present implementations can be employed, as discussed more below. As shown in FIG. 10, system 1000 includes a client device 1010, server 1020, computing cluster 202, and computing cluster 302 connected by one or more network(s) 1050. Note that client devices can be embodied both as mobile devices such as smart phones and tablets, as well as stationary devices such as desktops. Likewise, the servers and/or clusters can be implemented using various types of computing devices. In some cases, any of the devices shown in FIG. 10, but particularly server 1020 and computing clusters 202 and 302, can be implemented in data centers, server farms, etc.

Certain components of the devices shown in FIG. 10 may be referred to herein by parenthetical reference numbers. For the purposes of the following description, the parenthetical (1) indicates an occurrence of a given component on client device 1010, (2) indicates an occurrence of a given component on server 1020, (3) indicates an occurrence on computing cluster 202, and (4) indicates an occurrence on computing cluster 302. Unless identifying a specific instance of a given component, this document will refer generally to the components without the parenthetical.

Generally, the devices shown in FIG. 10 may have respective processing resources 1001 and storage resources 1002, which are discussed in more detail below. The devices may also have various modules that function using the processing and storage resources to perform the techniques discussed herein. For example, client device 1010 can include a client application 1011 that can interact with application process 204 on computing cluster 202 and/or application process 304 on computing cluster 302. For instance, the client device can submit queries to the application processes, and receive responses from the application processes, over network 1050.

Orchestrator 102 can perform orchestration processing flow 100 as shown in FIG. 1. Scheduler 802 can perform scheduler processing flow 800 as shown in FIG. 8. The respective orchestration and scheduling outputs can be provided to runtime 1030(1) on computing cluster 202 and runtime 1030(2) on computing cluster 302. The runtime can be responsible for swapping in new services as they are received, generating execution logs 114, and providing the execution logs to the orchestrator and/or scheduler.

Example Orchestration Method

FIG. 11 illustrates an example orchestration method 1100 that can be used to orchestrate services of an application into one or more processes on one or more machines, consistent with the present concepts. Method 1100 can be implemented on many different types of devices, e.g., by one or more cloud servers, by a client device such as a laptop, tablet, or smartphone, or by combinations of one or more servers, client devices, etc. In some implementations, method 1100 is performed by orchestrator 102.

Method 1100 begins at block 1102, where dependency information for an application is obtained. As noted previously, the dependency information can be in the form of a dependency graph that conveys data dependencies between individual services of the application. In some cases, the dependency information is generated at compile time and does not change at runtime. In other cases, dependency information is generated entirely at runtime. In still other cases, initial dependency information is generated at compile time and then modified at runtime.
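
For illustration only, a dependency graph of this kind can be represented as a simple adjacency map. The service names below are hypothetical placeholders, not services described herein.

    # Sketch: dependency graph mapping each service to the downstream
    # services that consume its output. Names are hypothetical.
    dependency_graph = {
        "parse_query": ["spell_check", "lookup_index"],
        "spell_check": ["lookup_index"],
        "lookup_index": ["rank_results"],
        "rank_results": [],
    }

    def upstream_dependencies(graph):
        # Invert the graph: for each service, the services whose output it
        # needs before it can be scheduled.
        deps = {service: set() for service in graph}
        for producer, consumers in graph.items():
            for consumer in consumers:
                deps[consumer].add(producer)
        return deps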

Method 1100 continues at block 1104, where runtime characteristics of the individual services are identified. As previously noted, in some cases, the runtime characteristics are actual runtime values based on previous executions of the application. In other cases, the runtime characteristics are expected runtime characteristics provided by a developer.
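
A per-service record of such characteristics might look like the following sketch. The fields mirror the runtime characteristics discussed in this document, but the names and units are illustrative assumptions.

    # Sketch: runtime characteristics for one service, whether measured from
    # execution logs or supplied as a developer hint. Units are assumptions.
    from dataclasses import dataclass

    @dataclass
    class RuntimeCharacteristics:
        memory_utilization_mb: float
        thread_utilization: int
        network_utilization_mbps: float
        storage_throughput_mbps: float
        storage_capacity_mb: float
        serialization_cost_ms: float
        runtime_duration_ms: float
        measured: bool  # True if derived from logs, False for a developer hint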

Method 1100 continues at block 1106, where automated orchestration is performed. For instance, block 1106 can involve performing orchestration processing flow 100.

Example Scheduling Method

FIG. 12 illustrates an example scheduling method 1200 that can be used to schedule services of an application, consistent with the present concepts. As discussed more below, method 1200 can be implemented on many different types of devices, e.g., by one or more cloud servers, by a client device such as a laptop, tablet, or smartphone, or by combinations of one or more servers, client devices, etc. In some implementations, method 1200 is performed by scheduler 802.

Method 1200 begins at block 1202, where execution logs are evaluated to identify different critical paths. As previously noted, each time an application is executed, a corresponding critical path can be determined. As noted with respect to FIG. 6, different executions of an application can result in different critical paths.

Method 1200 continues at block 1204, where a statistical critical path is identified. Generally, the statistical critical path is a particular path through the services of the application that tends to be the critical path relatively frequently over multiple executions of the application. In some cases, the statistical critical path is the critical path that appears most frequently out of the critical paths identified at block 1202.
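
In the simple most-frequent-path case, block 1204 reduces to counting. The following sketch assumes a hypothetical log format in which each entry records the critical path of one execution as a sequence of service names.

    # Sketch: identify the statistical critical path as the per-execution
    # critical path that occurs most often in the logs. Log format assumed.
    from collections import Counter

    def statistical_critical_path(execution_logs):
        counts = Counter(tuple(entry["critical_path"]) for entry in execution_logs)
        path, _ = counts.most_common(1)[0]
        return list(path)

    logs = [
        {"critical_path": ["parse_query", "lookup_index", "rank_results"]},
        {"critical_path": ["parse_query", "spell_check", "lookup_index", "rank_results"]},
        {"critical_path": ["parse_query", "lookup_index", "rank_results"]},
    ]
    print(statistical_critical_path(logs))
    # ['parse_query', 'lookup_index', 'rank_results']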

Method 1200 continues at block 1206, where services are scheduled based on whether the services occur on the statistical critical path. For instance, services on the statistical critical path may be prioritized over services that are not on the statistical critical path. More generally, block 1206 can involve performing scheduler processing flow 800.

Device Implementations

As noted above with respect to FIG. 10, system 1000 includes several devices, including a client device 1010, a server 1020, and individual servers in computing clusters 202 and 302. As also noted, not all device implementations can be illustrated, and other device implementations should be apparent to the skilled artisan from the description above and below. For instance, in some implementations, orchestration and/or scheduling can be performed directly on a computing device that executes an application process, rather than on a separate device as illustrated.

The terms “device,” “computer,” “computing device,” “client device,” and/or “server device” as used herein can mean any type of device that has some amount of hardware processing capability and/or hardware storage/memory capability. Processing capability can be provided by one or more hardware processors (e.g., hardware processing units/cores) that can execute computer-readable instructions to provide functionality. Computer-readable instructions and/or data can be stored on storage resources. The term “system” as used herein can refer to a single device, multiple devices, etc.

Storage resources can be internal or external to the respective devices with which they are associated. The storage resources can include any one or more of volatile or non-volatile memory, hard drives, flash storage devices, and/or optical storage devices (e.g., CDs, DVDs, etc.), among others. In some cases, the modules of system 1000 are provided as executable instructions that are stored on persistent storage devices, loaded into random-access memory devices, and read from the random-access memory by the processing resources for execution.

As used herein, the term “computer-readable media” can include signals. In contrast, the term “computer-readable storage media” excludes signals. Computer-readable storage media includes “computer-readable storage devices.” Examples of computer-readable storage devices include volatile storage media, such as RAM, and non-volatile storage media, such as hard drives, optical discs, and flash memory, among others.

In some cases, the devices are configured with a general-purpose hardware processor and storage resources. In other cases, a device can include a system on a chip (SOC) type design. In SOC design implementations, functionality provided by the device can be integrated on a single SOC or multiple coupled SOCs. One or more associated processors can be configured to coordinate with shared resources, such as memory, storage, etc., and/or one or more dedicated resources, such as hardware blocks configured to perform certain specific functionality. Thus, the terms “processor,” “hardware processor,” or “hardware processing unit” as used herein can also refer to central processing units (CPUs), graphical processing units (GPUs), controllers, microcontrollers, processor cores, or other types of processing devices suitable for implementation both in conventional computing architectures as well as SOC designs.

Alternatively, or in addition, the functionality described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), etc.

In some configurations, any of the modules/code discussed herein can be implemented in software, hardware, and/or firmware. In any case, the modules/code can be provided during manufacture of the device or by an intermediary that prepares the device for sale to the end user. In other instances, the end user may install these modules/code later, such as by downloading executable code and installing the executable code on the corresponding device.

Also note that devices generally can have input and/or output functionality. For example, computing devices can have various input mechanisms such as keyboards, mice, touchpads, voice recognition, and gesture recognition (e.g., using depth cameras such as stereoscopic or time-of-flight camera systems, infrared camera systems, or RGB camera systems, or using accelerometers/gyroscopes, facial recognition, etc.). Devices can also have various output mechanisms such as printers, monitors, etc.

Also note that the devices described herein can function in a stand-alone or cooperative manner to implement the described techniques. For example, the methods and functionality described herein can be performed on a single computing device and/or distributed across multiple computing devices that communicate over network(s) 1050. Without limitation, network(s) 1050 can include one or more local area networks (LANs), wide area networks (WANs), the Internet, and the like.

In addition, some implementations may employ any of the disclosed techniques in an Internet of Things (IoT) context. In such implementations, a home appliance or automobile might provide computational resources that implement the modules of system 1000.

Additional Examples

Various device examples are described above. Additional examples are described below. One example includes a method performed by a computing device, the method comprising obtaining dependency information for an application, the dependency information representing data dependencies between a plurality of services of the application, identifying runtime characteristics of individual services, and based at least on the dependency information and the runtime characteristics, performing automated orchestration of the individual services into one or more application processes.

Another example can include any of the above and/or below examples where the identifying comprises deriving actual runtime characteristics of the individual services from execution logs reflecting previous executions of the application.

Another example can include any of the above and/or below examples where the identifying comprises, for each individual service, deriving, from the execution logs, at least the following actual runtime characteristics: memory utilization, thread utilization, network utilization, storage throughput, storage capacity utilization, serialization costs, runtime duration, and network costs.

Another example can include any of the above and/or below examples where the performing automated orchestration comprises orchestrating the plurality of services based at least on one or more orchestration objectives selected from a group comprising latency, financial cost, and reliability.

Another example can include any of the above and/or below examples where the performing automated orchestration comprises selecting a particular service to move from a first application process to a second application process based at least on how frequently the particular service is executed over multiple previous executions of the application.

Another example can include any of the above and/or below examples where the performing automated orchestration comprises selecting a particular service to move from a first application process to a second application process based at least on whether the particular service appears in a critical path of the application over multiple previous executions of the application.

Another example can include any of the above and/or below examples where the automated orchestration is performed dynamically without halting the one or more application processes.

Another example can include any of the above and/or below examples where the identifying comprises receiving a developer hint that conveys an expected runtime characteristic of a new version of a particular service.

Another example can include any of the above and/or below examples where the performing automated orchestration comprises removing an existing version of the particular service and performing automated orchestration of the new version of the particular service into a selected process based at least on the expected runtime characteristic conveyed by the developer hint.

Another example can include a method performed on a computing device, the method comprising evaluating execution logs for an application having a plurality of services to identify different critical paths corresponding to multiple executions of the application, identifying a statistical critical path for the application based at least on frequency of occurrence of the different critical paths in the execution logs, and scheduling individual services of the application based at least on whether the individual services occur on the statistical critical path.

Another example can include any of the above and/or below examples where the scheduling comprises assigning scheduling priorities to threads allocated to the individual services.

Another example can include any of the above and/or below examples where the scheduling comprises prioritizing specific services that occur on the statistical critical path above one or more other services that do not occur on the statistical critical path.

Another example can include any of the above and/or below examples where the method further comprises determining respective distances of the one or more other services from the statistical critical path and prioritizing the one or more other services based at least on the respective distances of the one or more other services from the statistical critical path.

Another example can include any of the above and/or below examples where determining the respective distances of the one or more other services from the statistical critical path comprises, based at least on previous execution times of the one or more other services, determining the respective distances as respective amounts of time that the one or more other services would have had to run before appearing in the statistical critical path.

Another example can include any of the above and/or below examples where the statistical critical path comprises a particular path through the application that, over the multiple executions, most frequently determines latency of the application.

Another example can include a system comprising a first computing cluster configured to execute a first application process, a second computing cluster configured to execute a second application process, and a computing device configured to execute an orchestrator configured to: obtain dependency information reflecting dependencies between a plurality of services of an application, obtain runtime information representing runtime characteristics of individual services, and based at least on the dependency information and the runtime characteristics, perform orchestration of the individual services into the first application process on the first computing cluster and the second application process on the second computing cluster.

Another example can include any of the above and/or below examples where the orchestrator comprises a solver or a machine-learned model.

Another example can include any of the above and/or below examples where the solver or the machine-learned model can be configured to perform the orchestration based at least on a latency objective for the application.

Another example can include any of the above and/or below examples where the orchestrator can be configured to detect a runtime change to the dependency information and modify the orchestration during execution of the application based at least on the runtime change to the dependency information.

Another example can include any of the above and/or below examples where the runtime change comprises a change to a runtime value for a number of loop iterations.

Conclusion

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims, and other features and acts that would be recognized by one skilled in the art are intended to be within the scope of the claims.

1-20. (canceled)
21. A method performed on a computing device, the method comprising: evaluating execution logs for an application having a plurality of services to identify different critical paths of the application during multiple previous executions of the application; identifying a statistical critical path for the application based at least on frequency of occurrence of the different critical paths in the execution logs, wherein the execution logs identify at least one other critical path of the application other than the statistical critical path and the statistical critical path determines overall latency of the application more frequently than the at least one other critical path; and scheduling individual services of the plurality of services of the application based at least on whether the individual services occur on the statistical critical path.
22. The method of claim 21, wherein the scheduling comprises assigning scheduling priorities to threads allocated to the individual services.
23. The method of claim 22, wherein the scheduling comprises: prioritizing specific services that occur on the statistical critical path above one or more other services that do not occur on the statistical critical path.
24. The method of claim 23, further comprising: determining respective time distances of the one or more other services from the statistical critical path; and prioritizing the one or more other services based at least on the respective time distances of the one or more other services from the statistical critical path.
25. The method of claim 24, wherein determining the respective time distances of the one or more other services from the statistical critical path comprises: based at least on previous execution times of the one or more other services, determining the respective time distances as respective amounts of time that the one or more other services would have had to run before appearing in the statistical critical path.
26. The method of claim 21, wherein the statistical critical path comprises a particular path through the application that, over the multiple previous executions, most frequently is the critical path of the application.
27. A system comprising: a processing unit; and a computer-readable storage medium storing computer-readable instructions which, when executed by the processing unit, cause the system to: identify different critical paths of an application during multiple previous executions of the application, the different critical paths determining overall latency of the application during different executions of the application; identify, from the different critical paths of the application, a statistical critical path of the application that determines overall latency of the application more frequently than other critical paths of the application; and schedule individual services of the application based at least on whether the individual services occur on the statistical critical path.
28. The system of claim 27, wherein the computer-readable instructions, when executed by the processing unit, cause the system to: access a dependency graph of the application; and assign scheduling priorities to the individual services based at least on distances of the individual services from a root node of the dependency graph.
29. The system of claim 28, wherein the computer-readable instructions, when executed by the processing unit, cause the system to: identify at least two different services in a particular layer of the dependency graph; and assign a relatively higher scheduling priority to a particular service in the particular layer that occurs on the statistical critical path and a relatively lower scheduling priority to another service in the particular layer that does not occur on the statistical critical path.
30. The system of claim 29, wherein the computer-readable instructions, when executed by the processing unit, cause the system to: assign higher priorities to first services in a first layer that is relatively closer to the root node than the particular layer; and assign lower priorities to second services in a second layer that is relatively further from the root node than the particular layer.
31. The system of claim 27, wherein the computer-readable instructions, when executed by the processing unit, cause the system to: identify at least two different services of the application for which input data is ready; and schedule a first service of the at least two different services that is on the statistical critical path before a second service of the at least two different services that is not on the statistical critical path.
32. The system of claim 31, wherein the computer-readable instructions, when executed by the processing unit, cause the system to: schedule a third service of the at least two different services before a fourth service of the at least two different services based at least on the third service having occurred more frequently in the different critical paths of the application than the fourth service.
33. The system of claim 31, wherein the computer-readable instructions, when executed by the processing unit, cause the system to: schedule a third service of the at least two different services before a fourth service of the at least two different services based at least on the third service being relatively closer to the statistical critical path of the application than the fourth service.
34. The system of claim 27, wherein the individual services are scheduled to run on at least two different computing clusters.
35. A computer-readable storage media storing executable instructions which, when executed by a processing unit, cause the processing unit to perform acts comprising: identifying a statistical critical path for an application, wherein the statistical critical path determines overall latency of the application more frequently than at least one other critical path of the application; and scheduling individual services of the application based at least on whether the individual services occur on the statistical critical path.
36. The computer-readable storage media of claim 35, wherein the scheduling the individual services is further based at least on where the individual services occur in a dependency graph of the application.
37. The computer-readable storage media of claim 36, wherein the scheduling the individual services is further based at least on proximity of the individual services to a root node of the dependency graph.
38. The computer-readable storage media of claim 37, wherein scheduling the individual services comprises determining distances of the individual services from the statistical critical path and scheduling the individual services based at least on the distances.
39. The computer-readable storage media of claim 37, wherein scheduling the individual services is based at least on an expected resource conflict.
40. The computer-readable storage media of claim 39, wherein the scheduling comprises preferentially scheduling a first service to complete before a second service based on expected usage of a particular resource by the first service and the second service.